A Novel Accuracy and Similarity Search Structure Based on Parallel Bloom Filters

نویسندگان

  • Chunyan Shuai
  • Hengcheng Yang
  • Xin Ouyang
  • Siqi Li
  • Zheng Chen
چکیده

In high-dimensional spaces, accuracy and similarity search by low computing and storage costs are always difficult research topics, and there is a balance between efficiency and accuracy. In this paper, we propose a new structure Similar-PBF-PHT to represent items of a set with high dimensions and retrieve accurate and similar items. The Similar-PBF-PHT contains three parts: parallel bloom filters (PBFs), parallel hash tables (PHTs), and a bitmatrix. Experiments show that the Similar-PBF-PHT is effective in membership query and K-nearest neighbors (K-NN) search. With accurate querying, the Similar-PBF-PHT owns low hit false positive probability (FPP) and acceptable memory costs. With K-NN querying, the average overall ratio and rank-i ratio of the Hamming distance are accurate and ratios of the Euclidean distance are acceptable. It takes CPU time not I/O times to retrieve accurate and similar items and can deal with different data formats not only numerical values.

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

منابع مشابه

Novel structure of optical add/drop filters and multi-channel filter based on photonic crystal for using in optical telecommunication devices

In this paper, Using a 2D photonic crystal and a novel square ring resonator,several compact and simple structures have been introduced in the present paper toconstruct optical add/drop filters and multi-channel filter. The difference structures hasbeen designed and simulated by using the proposed square ring resonator and differentdropping waveguides. To do analyses, th...

متن کامل

Using Bloom Filters to Refine Web Search Results

Search engines have primarily focused on presenting the most relevant pages to the user quickly. A less well explored aspect of improving the search experience is to remove or group all near-duplicate documents in the results presented to the user. In this paper, we apply a Bloom filter based similarity detection technique to address this issue by refining the search results presented to the us...

متن کامل

Content-Based Routing of Path Queries in Peer-to-Peer Systems

Peer-to-peer (P2P) systems are gaining increasing popularity as a scalable means to share data among a large number of autonomous nodes. In this paper, we consider the case in which the nodes in a P2P system store XML documents. We propose a fully decentralized approach to the problem of routing path queries among the nodes of a P2P system based on maintaining specialized data structures, calle...

متن کامل

A Novel Architecture for Detecting Phishing Webpages using Cost-based Feature Selection

Phishing is one of the luring techniques used to exploit personal information. A phishing webpage detection system (PWDS) extracts features to determine whether it is a phishing webpage or not. Selecting appropriate features improves the performance of PWDS. Performance criteria are detection accuracy and system response time. The major time consumed by PWDS arises from feature extraction that ...

متن کامل

Bloofi: Multidimensional Bloom Filters

Bloom filters are probabilistic data structures commonly used for approximate membership problems in many areas of Computer Science (networking, distributed systems, databases, etc.). With the increase in data size and distribution of data, problems arise where a large number of Bloom filters are available, and all them need to be searched for potential matches. As an example, in a federated cl...

متن کامل

ذخیره در منابع من


  با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

عنوان ژورنال:

دوره 2016  شماره 

صفحات  -

تاریخ انتشار 2016